Encode, Review, and Decode: Reviewer Module for Caption Generation

Authors

  • Zhilin Yang
  • Ye Yuan
  • Yuexin Wu
  • Ruslan Salakhutdinov
  • William W. Cohen
Abstract

We propose a novel extension of the encoder-decoder framework, called a review network. The review network is generic and can enhance any existing encoder-decoder model: in this paper, we consider RNN decoders with both CNN and RNN encoders. The review network performs a number of review steps with an attention mechanism on the encoder hidden states, and outputs a thought vector after each review step; the thought vectors are used as the input to the attention mechanism in the decoder. We show that conventional encoder-decoders are a special case of our framework. Empirically, we show that our framework improves over state-of-the-art encoder-decoder systems on the tasks of image captioning and source code captioning.
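The review steps described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the plain tanh recurrent update, the dot-product attention scoring, the random weights, and every name used (`attend`, `review`, `w_ctx`, `w_hid`) are assumptions chosen only to show how thought vectors could be produced and then attended over by a decoder.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def attend(query, states):
    """Dot-product attention (an assumption): weighted average of `states`."""
    weights = softmax([dot(query, s) for s in states])
    dim = len(states[0])
    return [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]


def review(encoder_states, num_steps, w_ctx, w_hid):
    """Run `num_steps` review steps and return one thought vector per step."""
    thought = [0.0] * len(encoder_states[0])
    thoughts = []
    for _ in range(num_steps):
        # Each review step attends over ALL encoder hidden states.
        context = attend(thought, encoder_states)
        # Simplified tanh recurrent update (an assumption, not the paper's unit).
        pre = [a + b for a, b in zip(matvec(w_ctx, context), matvec(w_hid, thought))]
        thought = [math.tanh(p) for p in pre]
        thoughts.append(thought)
    return thoughts


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DIM, NUM_INPUTS, NUM_REVIEW_STEPS = 4, 5, 3

# Stand-ins for encoder hidden states (e.g. CNN or RNN outputs).
encoder_states = random_matrix(NUM_INPUTS, DIM, rng)
w_ctx = random_matrix(DIM, DIM, rng)
w_hid = random_matrix(DIM, DIM, rng)

thoughts = review(encoder_states, NUM_REVIEW_STEPS, w_ctx, w_hid)

# The decoder attends over the thought vectors, not the raw encoder states.
decoder_state = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
decoder_context = attend(decoder_state, thoughts)
```

In this sketch the decoder never sees `encoder_states` directly; its attention input is the list of thought vectors, which is the structural change the abstract describes.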


Similar resources

A Novel Patch-Based Digital Signature

In this paper, a new patch-based digital signature (DS) is proposed. The proposed approach, similar to steganography methods, hides the secure message in a host image. However, it uses a patch-based key to encode/decode the data, like cryptography approaches. Both the host image and key patches are randomly initialized. The proposed approach consists of encoding and decoding algorithms. The encodin...


Phrase-based Image Captioning with Hierarchical LSTM Model

Automatic generation of captions to describe the content of an image has been gaining a lot of research interest recently, where most of the existing works treat the image caption as pure sequential data. Natural language, however, possesses a temporal hierarchy structure, with complex dependencies between each subsequence. In this paper, we propose a phrase-based hierarchical Long Short-Term Memor...


ISIA at the ImageCLEF 2017 Image Caption Task

This paper describes the details of our methods for participation in the caption prediction task of ImageCLEF 2017. The dataset we use is provided entirely by the organizers and doesn’t include any external resources. The key components of our framework include a deep model part, an SVM part and a caption retrieval part. In the deep model part, we use an end-to-end architecture with Convolutional neural...


Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images, which is a much more abundant commodity. We here propose a novel way of using such textual data by artificially generating missing visual information. We eva...


Diversity driven attention model for query-based abstractive summarization

Abstractive summarization aims to generate a shorter version of the document covering all the salient points in a compact and coherent fashion. On the other hand, query-based summarization highlights those points that are relevant in the context of a given query. The encode-attend-decode paradigm has achieved notable success in machine translation, extractive summarization, dialog systems, etc. ...



Journal:
  • CoRR

Volume: abs/1605.07912

Publication date: 2016